Self-organizing feature map with improved performance by non-monotonic variation of the learning rate

ABSTRACT

The learning rate used for updating the weights of a self-ordering feature map is determined by a process that injects some type of perturbation into the value so that it is not simply monotonically decreased with each training epoch. For example, the learning rate may be generated according to a pseudorandom process. The result is faster convergence of the synaptic weights.

FIELD OF THE INVENTION

[0001] The invention relates to Self-Organizing Feature Maps (SOFMs), which are neural networks that transform an input of arbitrary dimension into a one- or two-dimensional discrete map subject to a topological (neighborhood-preserving) constraint, and more particularly to such SOFMs in which the initial values of the weight vectors are random.

BACKGROUND

[0002] Neural networks occupy a large branch of research in machine intelligence. Artificial neural networks are information-processing devices inspired by the interconnected, parallel structure of animal brains. They take the form of software or hardware networks having collections of mathematical models that emulate some of the observed characteristics of nervous systems and analogies to adaptive biological learning. Generally, they are composed of large numbers of interconnected processing elements, which can be realized in software or hardware, that are analogous to the neurons of an animal brain. The connections between these processing elements are weighted in a fashion believed to be analogous to synapses.

[0003] Training a neural network involves making adjustments to the “synaptic” connections that exist between the neurons (i.e., the values of the weights). Training is performed by exposure to a set of input/output data where the training algorithm iteratively adjusts the connection weights. These connection weights store the knowledge necessary to solve specific problems.

[0004] Neural networks are being applied to an increasing number of practical problems, including very complex ones. They are particularly suited to pattern recognition and classification problems having many inputs, such as speech recognition, character and signal recognition, and to functional prediction and system modeling where the physical processes are not understood or are highly complex.

[0005] There are many types of neural networks. Some of the more popular include the multilayer perceptron, which is generally trained with the backpropagation of error algorithm, learning vector quantization, radial basis function, Hopfield, and SOFM. Some are classified as feedforward and others as recurrent (i.e., implementing feedback) depending on how data is processed through the network. Another feature of a neural network is the mechanism by which it is trained. Some use a technique called supervised training while others are referred to as unsupervised or self-organizing. In supervised training, the network is guided by an instruction process, while in unsupervised algorithms the data is clustered into similar groups based on the attributes that provide the inputs to the algorithms.

[0006] The SOFM or Kohonen artificial neural network is a type of unsupervised neural network. In unsupervised learning, an untrained neural network is exposed to examples or input vectors and its internal parameters are adjusted. In SOFMs, all the neurons of the network receive the same input. The nodes engage in competition with their neighbors and, at each stage of a self-learning process, the one with the most activity “wins.” Learning is based on the concept of winner neurons.

[0007] Unsupervised learning allows the objects to be grouped together on the basis of their perceived closeness in n-dimensional hyperspace (where n is the number of variables or observations made on each object). Such methods, then, although in some sense quantitative, are better seen as qualitative since their chief purpose is merely to distinguish objects or populations.

[0008] Referring to FIG. 1, SOFMs provide an objective way of classifying data through self-organizing networks of artificial neurons. There are two layers, an input layer 110 and a competition layer 100. Each node of the input layer may be connected (as indicated by connectors 120) to the entire set of nodes in the competition layer. In an example configuration, each neuron may be connected to its eight nearest neighbors on a grid. The neurons store a set of weights (a weight vector), each of which corresponds to one of the inputs in the data. The objective of a Kohonen network is to map input vectors (patterns) of arbitrary dimension N onto a discrete map lying in a competition layer of arbitrary dimension, but typically of 1 or 2 dimensions. The algorithm adjusts weights so that patterns close to one another in the input space should be close to one another in the map: they should be topologically ordered.

[0009] The learning process is as follows: First, the weights for each output unit are initialized, typically to random starting values. An iterative process is performed that ends when weight changes are negligible. For each of a number of input patterns, a winning output node and all units in the neighborhood of the winner are identified and the weight vectors for all of these nodes are updated. The winning output unit is simply the unit with the weight vector that has the smallest Euclidean distance to the input pattern. The neighborhood of a unit is defined as all units within some distance of that unit on the map (not in weight space). If the size of the neighborhood is 1, then all units no more than 1, either horizontally or vertically, from any unit fall within its neighborhood. The weights of every unit in the neighborhood of the winning unit (including the winning unit itself) are updated such that each unit in the neighborhood is moved closer to the input pattern. As the iterations are performed, the learning rate is reduced. If the parameters are well chosen, the final network should capture the natural clusters in the input data.
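The map-space neighborhood described above can be illustrated with a minimal sketch. It assumes a rectangular grid and reads "no more than 1, either horizontally or vertically" as a row offset and a column offset that are each at most the neighborhood size (which yields the eight nearest neighbors for size 1); the function name `neighborhood` and its parameters are hypothetical and do not come from the specification.

```python
def neighborhood(row, col, size, n_rows, n_cols):
    """Return grid positions within `size` of (row, col) on the map.

    Assumption: a unit is a neighbor when both its row offset and its
    column offset are at most `size`. The unit itself is included, and
    positions outside the grid are clipped away.
    """
    return [
        (r, c)
        for r in range(max(0, row - size), min(n_rows, row + size + 1))
        for c in range(max(0, col - size), min(n_cols, col + size + 1))
    ]
```

The distance is measured between positions on the map, not between weight vectors, so the neighborhood of a unit depends only on the grid layout.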

[0010] The factor that governs the size of the weight alterations is known as the learning rate. The adjustments to each item in the weight vector are made in accordance with the following:

$\Delta W_{i,j} = \alpha\left( I_{j} - W_{i,j} \right)\frac{\sin d}{2d}$

[0011] where $W_{i,j}$ is the $j$-th weight of the $i$-th node, $\alpha$ is the learning rate, $I_{j}$ is the $j$-th component of the input vector, and $d$ is the distance between the current node and the winner. The above formula is one of a number that are known in the prior art, and the invention to be discussed in subsequent sections could make use of it or any other. As mentioned, the process of training continues until the changes in the weights fall below some predetermined value in successive iterations.
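As an illustration only, the update formula above might be computed for a single node as follows. The sketch reads the neighborhood term as sin(d)/(2d) and, as an assumption not stated in the specification, uses the limiting value 1/2 when d = 0 (the winning node itself). The names `neighborhood_factor` and `weight_delta` are hypothetical.

```python
import math


def neighborhood_factor(d):
    """Evaluate sin(d) / (2d).

    Assumption: at d == 0 (the winner) the limit 1/2 is used, since the
    expression is otherwise undefined there.
    """
    if d == 0:
        return 0.5
    return math.sin(d) / (2.0 * d)


def weight_delta(weights, inputs, alpha, d):
    """Return the adjustment to each weight of one node.

    weights: the node's weight vector (W_i,j for fixed i)
    inputs:  the input vector (I_j)
    alpha:   the learning rate
    d:       map distance between this node and the winner
    """
    factor = neighborhood_factor(d)
    return [alpha * (x - w) * factor for w, x in zip(weights, inputs)]
```

For example, with weights [0.0, 1.0], input [1.0, 1.0], a learning rate of 0.5 and d = 0, the first weight moves by 0.5 × 1.0 × 0.5 = 0.25 and the second is unchanged.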

[0012] The effect of the “learning rule” (weight update algorithm) is to distribute the neurons evenly throughout the region of n-dimensional space populated by the training set. The neuron with the weight vector closest to a given input pattern will win for that pattern and for any other input patterns that it is closest to. Input patterns which allow the same node to win are then deemed to be in the same group, and when a map of their relationship is drawn a line encloses them. In the resulting map, it is possible to examine closely relationships between the items in the training set and visualize these relationships even for complex structures in high-dimensional input spaces.

[0013] There are two phases to the process of generating a solution. In the first phase, in the prior art, the learning rate begins at a high value close to unity and is gradually and monotonically decreased. The rate of decrease may be exponential, linear, or of some other form, and according to the prior art the particular pattern has not generally been regarded as particularly important. During this initial phase, called the “ordering phase,” the topological ordering of the weight vectors takes place. A long convergence phase follows and is associated with fine-tuning of the map. During the convergence phase, the learning rate is maintained at low values (well below 0.1, for example) and is monotonically and progressively decreased with each iteration.
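For comparison with what follows, the monotonic prior-art schedules mentioned above (exponential or linear decrease) might look like the sketch below. The function names, the starting value of 0.9, the time constant, and the final value are illustrative assumptions, not values taken from the prior art.

```python
import math


def exponential_rate(epoch, alpha0=0.9, tau=1000.0):
    """Exponentially decaying learning rate (assumed parameters)."""
    return alpha0 * math.exp(-epoch / tau)


def linear_rate(epoch, alpha0=0.9, alpha_min=0.01, total_epochs=1000):
    """Linearly decaying learning rate, held at alpha_min afterwards."""
    frac = min(epoch / total_epochs, 1.0)
    return alpha0 + (alpha_min - alpha0) * frac
```

Both functions never increase from one epoch to the next, which is the property the invention departs from.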

[0014] The quality of a SOFM solution for any given problem is by no means assured. Where the input vectors are high-dimensional and complex, results may take a very long time to converge and may even produce poor end results.

SUMMARY OF THE INVENTION

[0015] In a SOFM, rather than reduce the learning rate monotonically during progression of the iteration process, the learning rate is selected in a random or sporadic fashion. It has been found experimentally that this increases the rate of convergence in many instances. According to the prior art, the learning rate should be time-varying, but decreased monotonically. However, it has been discovered that, at least in some contexts, particularly in the case of higher dimensional input spaces, sporadic variation of the learning rate, at least during the initial stages, leads to improved performance.

[0016] The invention will be described in connection with certain preferred embodiments, with reference to the following illustrative figures so that it may be more fully understood. With reference to the figures, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] FIG. 1 is a diagrammatic representation of a self ordering feature map according to the prior art and consistent with embodiments of the invention.

[0018] FIG. 2 is a flow chart representing an algorithm for implementing the invention according to an embodiment thereof.

[0019] FIG. 3 is an illustration of selective random generation of learning rate parameters with progress of training of a SOFM.

[0020] FIG. 4 is an illustration of another manner of selective random generation of learning rate parameters with progress of training of a SOFM.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0021] Referring to FIG. 2, an algorithm for implementing the invention begins with the initialization of the weight vectors in step S10. Random values for all of the weight vectors are a typical choice. In step S20, a sample input vector is drawn at random from a pool of input vectors being used for training the SOFM. The selected vector is applied to the input nodes and a winning competition layer node is identified in step S30 according to a minimum Euclidean distance value:

$D_{i} = \sqrt{\sum\limits_{j = 1}^{n}\left( I_{j} - W_{i,j} \right)^{2}}$

[0022] where $W_{i,j}$ is the $j$-th weight of the $i$-th node, $I_{j}$ is the $j$-th component of the input vector, and $D_{i}$ is the distance of the $i$-th node from the input vector. The node with the lowest distance value is the winner. In step S40, a random value of the learning rate is generated. This learning rate is used to update the weights of the winning node and neighboring nodes.
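The winner selection of step S30 can be sketched as below, assuming the weight vectors are held as a list of lists. The helper names `distance` and `find_winner` are hypothetical; when two nodes tie, this sketch returns the one with the lower index, a detail the specification does not address.

```python
import math


def distance(weights, inputs):
    """Euclidean distance D_i between a node's weights and the input."""
    return math.sqrt(sum((x - w) ** 2 for w, x in zip(weights, inputs)))


def find_winner(weight_vectors, inputs):
    """Return the index of the node with the lowest distance value."""
    return min(
        range(len(weight_vectors)),
        key=lambda i: distance(weight_vectors[i], inputs),
    )
```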

[0023] In step S50, the winning node and all neighborhood nodes are then updated according to the following formula:

$\Delta W_{i,j} = \alpha\left( I_{j} - W_{i,j} \right)\frac{\sin d}{2d}$

[0024] where $W_{i,j}$ is the $j$-th weight of the $i$-th node, $\alpha$ is the random learning rate, $I_{j}$ is the $j$-th component of the input vector, and $d$ is the distance between the current node and the winner. As mentioned, this process of training continues until the changes in the weights fall below some predetermined value in successive iterations. The test is made in step S60.
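Steps S10 through S60 can be put together in a minimal sketch. It makes several assumptions that the specification leaves open: a one-dimensional map, a learning rate drawn uniformly from [0, 1), the value 1/2 for the neighborhood term at d = 0, and a stopping test that requires the largest weight change to stay below a tolerance for a few consecutive iterations. The function name `train_sofm` and all parameter names and defaults are hypothetical.

```python
import math
import random


def train_sofm(samples, n_nodes, radius=1, tol=1e-4, patience=3,
               max_iters=10000, seed=0):
    """Train a 1-D map and return (weights, iterations_run)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # S10: initialize every weight vector to random values.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    quiet = 0   # consecutive iterations with negligible weight change
    iters = 0
    while iters < max_iters and quiet < patience:
        iters += 1
        # S20: draw a training vector at random from the pool.
        x = rng.choice(samples)
        # S30: the winner has the smallest Euclidean distance to x.
        winner = min(
            range(n_nodes),
            key=lambda i: math.sqrt(
                sum((xj - wj) ** 2 for xj, wj in zip(x, weights[i]))
            ),
        )
        # S40: generate a random learning rate (assumed uniform on [0, 1)).
        alpha = rng.random()
        # S50: update the winner and its neighbors on the map.
        largest = 0.0
        first = max(0, winner - radius)
        last = min(n_nodes - 1, winner + radius)
        for i in range(first, last + 1):
            d = abs(i - winner)
            # Assumed limit of sin(d) / (2d) at d == 0 is 1/2.
            factor = 0.5 if d == 0 else math.sin(d) / (2.0 * d)
            for j in range(dim):
                delta = alpha * (x[j] - weights[i][j]) * factor
                weights[i][j] += delta
                largest = max(largest, abs(delta))
        # S60: stop once changes stay below tol for `patience` iterations.
        quiet = quiet + 1 if largest < tol else 0
    return weights, iters
```

A call such as `train_sofm([[0.1, 0.2], [0.8, 0.9], [0.5, 0.5]], n_nodes=4)` returns the trained weight vectors and the number of iterations performed. Requiring several quiet iterations in a row guards against stopping early merely because one randomly drawn learning rate happened to be very small.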

[0025] Note that the above formulas are examples for purposes of describing an embodiment. There are other formulas that may be used for updating the weights and the invention is not limited to any particular one.

[0026] Referring now to FIG. 3, the generation of successive values of the learning rate can be performed in various ways consistent with the present invention. Preferably, the values over which the random rates range should become smaller as the simulation progresses. In one example embodiment illustrated in FIG. 3, the learning rate is a random value between bounds 161 and 162, which decrease gradually as the training process progresses. This need not be a monotonic reduction in range, as illustrated at 160 in FIG. 3, but the learning rates preferably get smaller in value, and range over a smaller range, as the simulation progresses. In another embodiment, the learning rate is varied in a similar range 160, but is varied cyclically or pseudorandomly. Preferably, the learning rate varies around unity initially and falls to values near zero, decreasing by several orders of magnitude.
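One way the bounded random generation of FIG. 3 might be realized is sketched below. The geometric decay of the upper bound, the lower bound taken as a fixed fraction of the upper bound, and every numeric default are assumptions chosen only so that the rate starts around unity and ends about three orders of magnitude lower; `rate_bounds` and `random_rate` are hypothetical names.

```python
import random


def rate_bounds(epoch, total_epochs, upper_start=1.2, upper_end=1e-3,
                lower_ratio=0.5):
    """Return (lower, upper) bounds for the learning rate at `epoch`.

    Assumption: the upper bound decays geometrically from upper_start
    to upper_end, and the lower bound is lower_ratio times the upper.
    """
    frac = min(max(epoch / total_epochs, 0.0), 1.0)
    upper = upper_start * (upper_end / upper_start) ** frac
    return lower_ratio * upper, upper


def random_rate(epoch, total_epochs, rng):
    """Draw a learning rate uniformly between the current bounds."""
    lower, upper = rate_bounds(epoch, total_epochs)
    return rng.uniform(lower, upper)
```

With `rng = random.Random(1)`, calling `random_rate(t, 1000, rng)` for successive values of `t` gives rates that trend downward with the bounds while individual values may rise or fall from one epoch to the next.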

[0027] Referring also to FIG. 4, another alternative for generating the learning rate may permit random variation (illustrated at 170) of the learning rate during the ordering phase of the training and a switchover to monotonic reduction (illustrated at 180) of the learning rate for the convergence phase.
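The two-phase alternative of FIG. 4 could be sketched as follows. The length of the ordering phase, the uniform range used during it, and the exponential form and starting value of the convergence-phase decay are all illustrative assumptions; the name `two_phase_rate` is hypothetical.

```python
import math
import random


def two_phase_rate(epoch, ordering_epochs, rng, low=0.1, high=1.0,
                   start=0.05, tau=500.0):
    """Random rate in the ordering phase, monotonic decay afterwards."""
    if epoch < ordering_epochs:
        # Ordering phase: random variation (assumed uniform range).
        return rng.uniform(low, high)
    # Convergence phase: assumed exponential decay from a low start.
    return start * math.exp(-(epoch - ordering_epochs) / tau)
```

The convergence-phase starting value of 0.05 is chosen to sit below 0.1, in line with the low values described for that phase in the background.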

[0028] Although particular embodiments of the present invention have been shown and described, it will be understood that it is not intended to limit the invention to the preferred embodiments and it will be obvious to those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the present invention. Thus, the invention is intended to cover alternatives, modifications, and equivalents, which may be included within the spirit and scope of the invention.

We claim:
 1. A method for training a self ordering map, comprising the steps of: initializing a set of weights of a self-ordering map; iteratively training said weights over many training epochs; for at least a number of said epochs, said step of iteratively training including updating said weights based on a learning rate that is generated according to a function that changes in a fashion that is other than monotonically a decreasing value with training epoch.
 2. A method as in claim 1, wherein said step of iteratively training includes updating said weights based on a learning rate that is generated according to a random or pseudorandom function.
 3. A method as in claim 2 wherein said step of iteratively training includes updating said weights based on a learning rate that is generated according to a function that is such that values over which said learning rate may range decreases with training epoch.
 4. A method as in claim 2 wherein said step of iteratively training includes updating said weights based on a learning rate that is generated according to a function that is such that values over which said learning rate tend to decrease with training epoch.
 5. A method as in claim 1 wherein said step of iteratively training includes updating said weights based on a learning rate that is generated according to a function that is such that values over which said learning rate may range decreases with training epoch.
 6. A method as in claim 5 wherein said step of iteratively training includes updating said weights based on a learning rate that is generated according to a function that is such that values over which said learning rate tend to decrease with training epoch.
 7. A method as in claim 1 wherein said step of iteratively training includes updating said weights based on a learning rate that is generated according to a function that is such that values over which said learning rate tend to decrease with training epoch.
 8. A method of training a self ordering feature map, comprising the steps of: choosing a random value for initial weight vectors; drawing a sample from a set of training sample vectors and applying it to input nodes of said self ordering feature map; identifying a winning competition node of said self ordering feature map according to a least distance criterion; adjusting a synaptic weight of at least said winning node; said step of adjusting including selecting a value for a learning rate used to update said synaptic weight that is based on a function other than one that is monotonic with training epoch; iteratively repeating said steps of drawing, identifying, and adjusting.
 9. A method as in claim 8, wherein said step of adjusting includes updating said weights based on a learning rate that is generated according to a random or pseudorandom function.
 10. A method as in claim 9 wherein said step of adjusting includes updating said weights based on a learning rate that is generated according to a function that is such that values over which said learning rate may range decreases with training epoch.
 11. A method as in claim 9 wherein said step of adjusting includes updating said weights based on a learning rate that is generated according to a function that is such that values over which said learning rate tend to decrease with training epoch.
 12. A method as in claim 8 wherein said step of adjusting includes updating said weights based on a learning rate that is generated according to a function that is such that values over which said learning rate may range decreases with training epoch.
 13. A method as in claim 12 wherein said step of adjusting includes updating said weights based on a learning rate that is generated according to a function that is such that values over which said learning rate tend to decrease with training epoch.
 14. A method as in claim 8 wherein said step of adjusting includes updating said weights based on a learning rate that is generated according to a function that is such that values over which said learning rate tend to decrease with training epoch.